



Europäisches
Patentamt

European
Patent Office

Office européen
des brevets




Bescheinigung Certificate 



Attestation 



Die angehefteten Unterlagen
stimmen mit der
ursprünglich eingereichten
Fassung der auf dem nächsten
Blatt bezeichneten
europäischen Patentanmeldung
überein.



The attached documents 
are exact copies of the 
European patent application 
described on the following 
page, as originally filed. 



Les documents fixés à
cette attestation sont
conformes à la version
initialement déposée de
la demande de brevet
européen spécifiée à la
page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet n° 

03300223.9 



PRIORITY DOCUMENT

SUBMITTED OR TRANSMITTED IN 
COMPLIANCE WITH 
RULE 17.1(a) OR (b) 



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office

Le Président de l'Office européen des brevets
p.o.



R C van Dijk 




EPA/EPO/OEB Form 1014.1 - 02.2000 7001014 







Anmeldung Nr:
Application no.:
Demande no:

03300223.9

Anmeldetag:
Date of filing:
Date de dépôt:

24.11.03

Anmelder/Applicant(s)/Demandeur(s):

Koninklijke Philips Electronics N.V.
Groenewoudseweg 1
5621 BA Eindhoven
PAYS-BAS

Bezeichnung der Erfindung/Title of the invention/Titre de l'invention:
(Falls die Bezeichnung der Erfindung nicht angegeben ist, siehe Beschreibung.
If no title is shown please refer to the description.
Si aucun titre n'est indiqué se référer à la description.)

A method and system for the detection and segmentation of local visual space-time
details in a video signal

In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s)
revendiquée(s)

Staat/Tag/Aktenzeichen/State/Date/File no./Pays/Date/Numéro de dépôt:

Internationale Patentklassifikation/International Patent Classification/
Classification internationale des brevets:

Am Anmeldetag benannte Vertragstaaten/Contracting states designated at date of
filing/Etats contractants désignés lors du dépôt:

AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL
PT RO SE SI SK TR LI

03300223.9

EPA/EPO/OEB Form 1014.2 - 01.2000 7001014






A METHOD AND SYSTEM FOR THE DETECTION AND SEGMENTATION OF LOCAL VISUAL 
SPACE-TIME DETAILS IN A VIDEO SIGNAL 

Field of the invention 

The present invention relates to the field of video signal processing, such as for TV or DVD
signals. More specifically, the invention relates to methods for detection and segmentation
of local visual space-time details in video signals. In addition, the invention relates to
systems for detection and segmentation of local visual space-time details in video signals.

Background of the invention

Data compression of video signals consisting of a stream of images (frames) has become
widespread, since a large amount of channel or storage capacity can be saved in the
transmission of digital video data such as for TV or DVD. Standards such as MPEG
and H.26x provide a high degree of data compression using block-based motion
compensation techniques. Normally, macro-blocks of 16x16 pixels are used for
representation of motion information. For many normal video signals these compression
techniques provide a high data compression rate without suffering from any visual artefact
that is perceptible by the human eye.

However, the standard compression schemes are known not to be transparent, i.e. for
certain video signals they give rise to visual artefacts. Such visual artefacts occur when
the video signal includes motion pictures with local space-time details. Local space-time
details are represented by spatial texture that varies its local characteristics in time in
an indefinite manner. Examples are motion pictures of fire, wavy water, rising steam,
leaves fluttering in the wind etc. In these cases the motion picture information
representation by 16x16 pixel macro-blocks offered by the compression schemes is too
coarse to avoid loss of visual information. This is a problem when trying to achieve optimal
high quality video reproduction in combination with the benefits of MPEG or H.26x
compression with respect to bit rate reduction.

In order to avoid visual artefacts in a video signal intended for compression, it is necessary
to detect local space-time details that may cause visual artefacts by compression prior to
applying the compression procedure. Having located these parts in the video signal, it is
possible to apply special processing to these parts so as to avoid artefacts being
introduced by the compression procedure. Methods for detecting and indicating image
blocks of a video signal that include space-time details are known.

EP 0 571 121 B1 describes an image processing method that is an elaboration of the known
so-called Horn-Schunck method. This method is described in B.K.P. Horn and B.G. Schunck,
"Determining Optical Flow", Artificial Intelligence, Vol. 17, 1981, pp. 185-204. The
Horn-Schunck method includes extraction of pixel-wise image velocity information called optical
flow. For each single image an optical flow vector is determined, and a condition number is
computed based on this vector. In EP 0 571 121 B1 a local condition number is computed
based on the optical flow vector for each image, the goal being to obtain a robust optical
flow.



EP 1 233 373 A1 describes a method for segmentation of fragments of an image exhibiting
similarities in various visual attributes. Various criteria are described for combining small
regions of an image into larger regions exhibiting similar characteristics within a
predetermined threshold. In relation to detection of motion an affine motion model is used,
which implies calculation of optical flow.

US 6 456 731 B1 describes a method for estimation of optical flow and an image synthesis
method. The described estimation of optical flow is based on the known Lucas-Kanade
method described in B.D. Lucas and T. Kanade, "An iterative image registration technique
with an application to stereo vision", Proceedings of the 7th International Joint Conference
on Artificial Intelligence, 1981, Vancouver, pp. 674-679. The Lucas-Kanade method
[...] neighbourhood of a pixel [...] velocity [...].
In EP 0 571 121 B1, it performs the step of computing optical flow, and subsequently the
step of image registering.

Summary of the invention 

It is an object of the present invention to provide a method of detecting local
space-time details [...] that is simple to implement and [...] low cost equipment. By
space-time details of an image is understood image regions containing a large spatial
brightness variation that exhibit strong temporal changes at the local level, wherein the
velocities of these spatial parts are weakly correlated in time.

A first aspect of the present invention provides a method of detecting local space-time
details of a video signal representing a plurality of images, the method comprising, for
each image, the steps of:

A) dividing the image into one or more blocks of pixels,
B) calculating at least one space-time feature for at least one pixel within each of said one
or more blocks,
C) calculating for each of the one or more blocks at least one statistical parameter for each
of the at least one space-time features calculated within the block, and
D) detecting one or more blocks wherein the at least one statistical parameter exceeds a
predetermined level.

Preferably the at least one space-time feature comprises visual normal flow magnitude
and/or visual normal flow direction. The visual normal flow represents the component of
the optical flow that is parallel to the image brightness spatial gradient. The at least one
space-time feature may further comprise visual normal acceleration magnitude and/or
visual normal acceleration direction. Visual normal acceleration describes temporal
variation of the visual normal flow along the normal (image brightness gradient) direction.

Preferably, the method further comprises the step of calculating horizontal and vertical
histograms of the at least one space-time feature calculated in step B).

The at least one statistical parameter of step C) may comprise one or more of: variance,
average, and at least one parameter of a probability function. The block(s) of pixels are
preferably non-overlapping square blocks, and their size may be: 2x2 pixels, 4x4 pixels,
6x6 pixels, 8x8 pixels, 12x12 pixels, or 16x16 pixels.

The method may further comprise the step of pre-processing the image prior to applying
step A), so as to reduce noise in the image, this pre-processing preferably comprising the
step of convolving the image with a low-pass filter.

The method may further comprise an intermediate step between steps C) and D), the
intermediate step comprising calculating at least one inter-block statistical parameter
involving at least one of the statistical parameters calculated for each block. The at least
one inter-block statistical parameter may be calculated using a 2-D Markovian non-causal
neighbourhood structure.

The method may further comprise the step of determining a pattern of temporal evolution
for each of the at least one statistical parameter calculated in step C). The method may
further comprise the step of indexing at least part of an image comprising one or more
blocks detected in step D). Furthermore, the method may comprise the step of increasing
data rate allocation to the one or more blocks detected in step D). In another embodiment
the method may further comprise the step of inserting an image in a de-interlacing
system.

A second aspect of the invention provides a system for detecting local space-time details of
a video signal representing a plurality of images, the system comprising:

- means for dividing an image into one or more blocks of pixels,

- space-time feature calculating means for calculating at least one space-time feature for
at least one pixel within each of the one or more blocks,

- statistical parameter calculating means for calculating for each of the one or more blocks
at least one statistical parameter for each of the at least one space-time features
computed within the one or more blocks, and

- detecting means for detecting one or more blocks wherein the at least one statistical
parameter exceeds a predetermined level.

A third aspect of the invention provides a device comprising a system according to the 
system of the second aspect. 



A fourth aspect of the invention provides a signal processor system programmed to
operate according to the method of the first aspect.

A fifth aspect of the invention provides a de-interlacing system for a television (TV)
apparatus, the de-interlacing system operating according to the method of the first aspect.

A sixth aspect provides a video signal encoder for encoding a video signal representing a
plurality of images, the video signal encoder comprising:

[...] at least one statistical parameter for each of the at least one space-time features
[...], and

- means for adjusting the quantisation scale for the one or more blocks in accordance with
the at least one statistical parameter.

A seventh aspect provides a video signal representing a plurality of images, the video
signal comprising information regarding image segments exhibiting space-time details
suitable for use with the method of the first aspect.

An eighth aspect provides a video storage medium comprising video signal data according
to the seventh aspect.

A ninth aspect provides a computer useable medium having a computer readable program
code embodied therein, the computer readable program code comprising:

- means for causing the computer to calculate at least one space-time feature for at least
one [...],
- means for causing the computer to calculate for each of the blocks at least one statistical
parameter for each of the at least one space-time features calculated within the one or
more blocks, and
- means for causing the computer to detect blocks wherein the at least one statistical
parameter exceeds a predetermined level.

A tenth aspect provides a video signal representing a plurality of images, the video signal
[...]

An eleventh aspect provides a method of processing a video signal, wherein the method of
processing comprises the method of the first aspect.

A twelfth aspect provides an integrated circuit comprising means for processing a video
signal according to the method of the first aspect.

A thirteenth aspect provides a program storage device readable by a machine and
encoding a program of instructions for executing the method of the first aspect.

Brief description of drawings 

In the following the invention is described in detail with reference to the accompanying
figures, wherein

Fig. 1 shows an illustration of normal and tangential flows at two points of a contour
moving with uniform velocity,

Fig. 2a shows an example of an image of two persons and a fountain basin including
splashing water,

Fig. 2b shows a grey scale plot representing for the image of Fig. 2a a block-wise level of
normal flow variance, wherein white blocks indicate blocks calculated to have a high level
of normal flow variance,

Fig. 3 shows a flow diagram of a system according to the present invention, and

Fig. 4 shows an example of a normal flow variance histogram.

While the invention is susceptible to various modifications and alternative forms, specific
embodiments have been shown by way of example in the drawings and will be described in
detail herein. It should be understood, however, that the invention is not intended to be
limited to the particular forms disclosed. Rather, the invention is to cover all modifications,
equivalents, and alternatives falling within the scope of the invention as defined by the
appended claims.

Detailed description of the invention 

According to an embodiment of the present invention the major operations to be carried 
out for processing an image are the steps: 



A) Divide image into blocks

B) Estimate local feature(s) 

C) Calculate feature statistics per block 






Step A) of processing an image is to divide the image into blocks. Preferably the blocks
coincide with macro blocks used by standard compression schemes such as MPEG or H.26x.
Therefore, the image is preferably divided into non-overlapping blocks of 8x8 pixels or
16x16 pixels. The image blocks, when 8x8 pixels large and aligned with the
(MPEG) image grid, coincide with typical I-frame DCT/IDCT computation and describe
spatial details information. When 16x16 pixels large and aligned with the
(MPEG) image grid, they coincide with P-frame (B-frame) macro blocks used for motion
compensation (MC) in block-based motion estimation in MPEG/H.26x video standards, and
this allows spatio-temporal details information to be described.

Step B) comprises estimating at least one local feature, the local feature relating to spatial,
temporal, and/or spatio-temporal details of the image. Preferably, two features are used
together with different associated metrics. The estimation of local features is based on a
combination of spatial and temporal image brightness gradients. The preferred features
are visual normal flow, i.e. visual normal velocity, and visual normal acceleration. The
feature may be based on either or both of visual normal velocity and visual normal
acceleration. For the case of visual normal velocity two consecutive frames (or images) are
used, while for the visual normal acceleration three consecutive frames (or images) are
necessary. A more thorough description of visual normal velocity and visual normal
acceleration is given in the following.

Step C) comprises calculating per block feature statistics. This includes the computation
of feature average and variance. Also, different probability density functions are matched
to these per block statistics. The per block statistics provide information so as to set up
a threshold or criteria allowing a categorisation of each block with respect to the amount of
space-time details. Thus, the per block statistics allow detection of blocks with a high
amount of space-time details, since such blocks exhibit per block statistical parameters
exceeding predetermined thresholds.

The visual normal flow represents the component of the optical flow that is parallel to the
image brightness spatial gradient. Optical flow is the most detailed velocity information
that can be extracted locally by processing two successive frames or video fields, but it is
computationally expensive to extract. The normal flow, on the other hand, is easy to
compute and it is very rich in local spatial and temporal information. For example,
calculation of optical flow typically requires 7x7x2 space-time neighbourhoods, while
normal flow requires only 2x2x2 neighbourhoods. In addition, calculation of optical flow
requires an optimisation, while calculation of normal flow does not.

The normal flow magnitude determines the amount of motion parallel to the local image
brightness gradient and the normal flow direction describes the local image brightness
orientation. Visual normal flow is calculated from:

$$v_x \frac{\partial I(x,y,t)}{\partial x} + v_y \frac{\partial I(x,y,t)}{\partial y} + \frac{\partial I(x,y,t)}{\partial t} = 0,$$

where I is brightness, x and y are spatial variables, and t is the time variable. The normal
flow direction encodes implicitly spatial variation of image brightness gradient and
therefore spatial texture information. The normal acceleration describes, as a second order
effect, how the normal flow varies locally.

Visual normal flow is defined as the normal, i.e. parallel to the spatial image gradient,
component of the local image velocity or optical flow. The image velocity can be
decomposed, at each image pixel, into normal and tangential components.



Fig. 1 shows, for illustration, a well-defined image boundary or contour that passes the
target pixel of an image. The diagram in Fig. 1 shows the normal and tangential flows at
two points of a contour moving with uniform velocity V. Going from point A to point B, the
normal and tangential image velocities (normal flow and tangential flow, respectively)
change their spatial orientation. This indeed happens from point to point due to contour
curvature. The normal and tangential flows are always 90° apart.



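An important property of the normal flow is that this is the only image velocity component
that can be locally computed in the image. The tangential component cannot be
computed. In order to explain this, it can be assumed that the image brightness $I(\cdot,\cdot;\cdot)$ is
constant when image point $P(x,y)$ at time $t$ moves to position $P'(x',y')$ at
time $t' = t + \Delta t$, where $(x',y') = (x,y) + \mathbf{V}\,\Delta t$. The image velocity is considered to be
constant and $\Delta t$ is "small". Therefore,

$$I(x,y,t) = I(x',y',t') \approx I(x,y,t) + \left(\mathbf{V}\cdot\nabla I(x,y,t) + \frac{\partial I(x,y,t)}{\partial t}\right)\Delta t, \qquad (1)$$

or

$$\mathbf{V}\cdot\nabla I(x,y,t) + \frac{\partial I(x,y,t)}{\partial t} \approx 0, \qquad (2)$$

where $\approx$ means approximate and $\nabla \equiv (\partial/\partial x,\ \partial/\partial y)$. Since $\mathbf{V} = \mathbf{V}_n + \mathbf{V}_t$ and $\mathbf{V}_t\cdot\nabla I = 0$,
(2) reduces to:

$$\mathbf{V}_n\cdot\nabla I(x,y,t) + \frac{\partial I(x,y,t)}{\partial t} \approx 0. \qquad (3)$$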



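This means that:

$$\mathbf{V}_n = \mathbf{n}\,|\mathbf{V}_n|, \qquad (4)$$

with

$$\mathbf{n} = \frac{\nabla I(x,y,t)}{|\nabla I(x,y,t)|}, \qquad (5)$$

$$|\mathbf{V}_n| = -\frac{\partial I(x,y,t)/\partial t}{|\nabla I(x,y,t)|}, \qquad (6)$$

where $|\mathbf{V}_n|$ is to be read as the signed magnitude of the normal flow along $\mathbf{n}$.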



The normal flow, in distinction to the image velocity, is also a measure of local image
brightness gradient orientation, and this measure implicitly includes the amount of spatial
shape variability, e.g. curvature, texture orientation, etc.
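As an illustration of how the normal flow can be obtained once the brightness derivatives are known, the following minimal Python sketch evaluates the unit gradient direction and the signed normal flow magnitude at a single pixel. It assumes the usual definitions, direction n = ∇I/|∇I| and magnitude -(∂I/∂t)/|∇I|; the function name `normal_flow` and the parameter `grad_threshold` are hypothetical and are not taken from the application.

```python
import math


def normal_flow(ix, iy, it, grad_threshold=1e-6):
    """Return (magnitude, direction) of the normal flow at one pixel.

    ix, iy, it are the brightness derivatives dI/dx, dI/dy and dI/dt.
    The magnitude is signed: -it / |grad I|.
    The direction is the angle (radians) of the unit gradient vector n.
    Returns None where the gradient is too weak to define n
    (grad_threshold is an assumed, illustrative cut-off).
    """
    grad_norm = math.hypot(ix, iy)
    if grad_norm < grad_threshold:
        return None
    magnitude = -it / grad_norm
    direction = math.atan2(iy, ix)
    return magnitude, direction
```

Pixels with a vanishing spatial gradient are skipped in this sketch because the normal direction is undefined there.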

Preferably, two different methods may be used to compute the normal flow in discrete
images I[i][j][k]. One method is the 2x2x2 brightness cube method described in B.K.P.
Horn, Robot Vision, The MIT Press, Cambridge, Massachusetts, 1986. Another method is
the feature based method.

In the 2x2x2 brightness cube method the spatial and temporal derivatives are
approximated according to (7)-(9).



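$$\frac{\partial I}{\partial x} \approx \frac{1}{4}\Big[\big(I[i+1][j][k] + I[i+1][j][k+1] + I[i+1][j+1][k] + I[i+1][j+1][k+1]\big) - \big(I[i][j][k] + I[i][j][k+1] + I[i][j+1][k] + I[i][j+1][k+1]\big)\Big] \qquad (7)$$

$$\frac{\partial I}{\partial y} \approx \frac{1}{4}\Big[\big(I[i][j+1][k] + I[i][j+1][k+1] + I[i+1][j+1][k] + I[i+1][j+1][k+1]\big) - \big(I[i][j][k] + I[i][j][k+1] + I[i+1][j][k] + I[i+1][j][k+1]\big)\Big] \qquad (8)$$

$$\frac{\partial I}{\partial t} \approx \frac{1}{4}\Big[\big(I[i][j][k+1] + I[i][j+1][k+1] + I[i+1][j][k+1] + I[i+1][j+1][k+1]\big) - \big(I[i][j][k] + I[i][j+1][k] + I[i+1][j][k] + I[i+1][j+1][k]\big)\Big] \qquad (9)$$

These discrete derivatives are computed inside the cells of a 2x2x2 brightness cube.

The feature based method is based on the following steps:

(a) Finding image points with high spatial gradients. This is implemented by: (i)
smoothing the image $I(\cdot,\cdot;\cdot)$ by applying to it a binomial approximation to a Gaussian
function; (ii) computing the discretised spatial image gradients
$\partial I/\partial x \approx \tfrac{1}{2}\big(I[i+1][j][k] - I[i-1][j][k]\big)$ and
$\partial I/\partial y \approx \tfrac{1}{2}\big(I[i][j+1][k] - I[i][j-1][k]\big)$; (iii) finding the subset of image points
for which $|\nabla I(\cdot,\cdot;\cdot)|$ is larger than a pre-determined threshold $T_{Gr}$. Also, use
$\partial I/\partial t \approx \tfrac{1}{2}\big(I[i][j][k+1] - I[i][j][k-1]\big)$, which involves three instead of two
successive frames.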

(b) The normal flow is computed iteratively at each feature position, i.e. each point with
"high" spatial gradient, by using the discrete version of (5) and (6). First, with the
initial computation of the normal flow, the local image is warped according to it to
refine the normal flow value. From the residual temporal derivative the residual
normal flow is computed and the initial normal flow estimate is updated. This is
repeated until the residual normal flow is smaller than ε (e.g. 0.001).
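The 2x2x2 brightness cube approximation mentioned above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: each derivative is taken as the average of the four first differences along the corresponding axis of the cube, images are plain lists of lists indexed as `frame[i][j]` with `i` along x and `j` along y, and the name `cube_derivatives` is hypothetical.

```python
def cube_derivatives(frame0, frame1, i, j):
    """Estimate (dI/dx, dI/dy, dI/dt) inside the 2x2x2 cube anchored at (i, j).

    frame0 and frame1 are two successive images (k and k+1), each a list
    of lists indexed as frame[i][j]. The index convention (i along x,
    j along y) is an assumption made for this sketch.
    """
    frames = (frame0, frame1)

    def brightness(di, dj, dk):
        # Brightness at cube corner (i + di, j + dj, k + dk).
        return frames[dk][i + di][j + dj]

    # Average of the four first differences along each cube axis.
    ix = 0.25 * sum(brightness(1, dj, dk) - brightness(0, dj, dk)
                    for dj in (0, 1) for dk in (0, 1))
    iy = 0.25 * sum(brightness(di, 1, dk) - brightness(di, 0, dk)
                    for di in (0, 1) for dk in (0, 1))
    it = 0.25 * sum(brightness(di, dj, 1) - brightness(di, dj, 0)
                    for di in (0, 1) for dj in (0, 1))
    return ix, iy, it
```

For a brightness ramp that increases by one per step along i and by two between the frames, the call returns a unit x-derivative, a zero y-derivative and a temporal derivative of two.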

Normal acceleration describes temporal variation of the normal flow along the normal
(image brightness gradient) direction. Its importance is due to the fact that the
acceleration measures how much the normal flow varies between at least three successive
frames, and thus makes it possible to determine how much the space-time details vary
between pairs of frames.

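One way to define the normal acceleration is by taking the temporal derivative of (3):

$$\frac{\partial}{\partial t}\left(\mathbf{V}_n\cdot\nabla I(x,y,t) + \frac{\partial I(x,y,t)}{\partial t}\right) \approx 0, \qquad (10)$$

so that:

$$\mathbf{A}_n\cdot\nabla I(x,y,t) + \mathbf{V}_n\cdot\nabla\frac{\partial I(x,y,t)}{\partial t} + \frac{\partial^2 I(x,y,t)}{\partial t^2} \approx 0, \qquad (11)$$

and

$$\mathbf{A}_n = \mathbf{n}\,|\mathbf{A}_n|, \qquad |\mathbf{A}_n| \approx -\frac{\dfrac{\partial^2 I(x,y,t)}{\partial t^2} + \mathbf{V}_n\cdot\nabla\dfrac{\partial I(x,y,t)}{\partial t}}{|\nabla I(x,y,t)|}, \qquad (12)$$

where $\mathbf{A}_n$ denotes the component of $\partial\mathbf{V}_n/\partial t$ along $\mathbf{n}$.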

Because of the second temporal derivative in (12), it is necessary to use a minimum of
three successive frames when implementing (12). Taking a 3x3x3 pixels wide cube to
compute the discretised versions of the derivatives in (12), it can be shown that:

[...] (13)

The other discretised derivatives can be obtained analogously to (7)-(9) on the 3x3x3 cube.
The goal of computing feature statistics is to detect space-time regions where a given
feature varies most - the segmentation and detection of high space-time details. This may
be implemented according to the following algorithm, given two (three) successive images:

1. Dividing the image into non-overlapping (square or rectangular) blocks,
2. Computing within each block a local feature set,
3. Determining, for each block, the average of the feature set computed in 2.,
4. Computing the variance, i.e. the average variation of each feature within each block,
from the average computed in 3., and
5. Given a threshold T, selecting a set of blocks for which the variance computed in
4. is larger than T.
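The five steps above can be sketched as follows. This is a minimal Python illustration rather than the implementation of the application: it assumes that a per-pixel feature map (for example the normal flow magnitude) has already been computed, uses square blocks, and ignores partial blocks at the right and bottom edges. The name `detect_detail_blocks` and its parameters are hypothetical.

```python
def detect_detail_blocks(feature, block_size, threshold):
    """Return (block_row, block_col) indices whose feature variance exceeds threshold.

    feature is a 2-D list with one feature value per pixel, for example
    the normal flow magnitude. Partial blocks at the right and bottom
    edges are ignored, leaving that remainder untessellated.
    """
    rows, cols = len(feature), len(feature[0])
    detected = []
    for br in range(rows // block_size):          # step 1: tessellate
        for bc in range(cols // block_size):
            values = [feature[r][c]               # step 2: feature set of the block
                      for r in range(br * block_size, (br + 1) * block_size)
                      for c in range(bc * block_size, (bc + 1) * block_size)]
            mean = sum(values) / len(values)      # step 3: block average
            variance = sum((v - mean) ** 2 for v in values) / len(values)  # step 4
            if variance > threshold:              # step 5: thresholding
                detected.append((br, bc))
    return detected
```

With 8x8 or 16x16 blocks the returned indices line up with the MPEG block grid, so they can be handed directly to a later encoding or indexing stage.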



In our implementation of the algorithm we choose square (8x8 or 16x16) blocks. This
tessellates the image into square blocks, and any remainder of the image is left
untessellated. A rectangular tessellation could be used in order to reduce this residual
untessellated image region, but this is of less interest because we want to align these
blocks with MPEG 8x8 (DCT) or 16x16 (MC) blocks for visual artefact pre-detection. The
computation of feature values within each block is implemented either at each pixel for
which $|\nabla I(\cdot,\cdot;\cdot)|$ is larger than a pre-determined threshold $T$, or at feature points for which
$|\nabla I(\cdot,\cdot;\cdot)|$ is larger than a pre-determined threshold $T_{Gr}$; usually $T < T_{Gr}$. The statistics
exemplified in steps 4. and 5. are just an illustration. More detailed statistics could be
computed. Also, specific probability density functions (pdf) and their statistics could be
computed.



In order to make the computations according to the above-mentioned or related
implementations more robust, a set of pre- and post-processing operations may be
applied. An example of pre-processing is to convolve the input images with low-pass
filters. Post-processing may include, for example, comparing neighbour blocks with respect
to their statistics, e.g. feature variance.
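One possible low-pass pre-processing filter is a 3x3 binomial kernel, which approximates a small Gaussian. The Python sketch below is an assumption-laden illustration: the kernel size, the clamped border handling and the name `binomial_smooth` are choices made here and are not prescribed by the text.

```python
def binomial_smooth(image):
    """Convolve a 2-D list image with the 3x3 binomial kernel ([1, 2, 1] x [1, 2, 1]) / 16.

    Border pixels are handled by clamping coordinates to the image,
    which is one of several possible border policies (an assumption).
    """
    weights = (1, 2, 1)
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for dr in (-1, 0, 1):
                rr = min(max(r + dr, 0), rows - 1)      # clamp row index
                for dc in (-1, 0, 1):
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    acc += weights[dr + 1] * weights[dc + 1] * image[rr][cc]
            out[r][c] = acc / 16
    return out
```

Because the kernel weights sum to 16, a constant image is left unchanged, while isolated noisy pixels are spread over their neighbourhood before the derivatives are estimated.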


Fig. 2a shows an example of an image taken from a sequence of images. In the image two
persons are watching splashing water in a fountain basin. One of the persons is partly
behind the splashing water. Such an image therefore includes local parts exhibiting an
example of a phenomenon expected to produce a chaotic brightness pattern, namely the
splashing water. Therefore, the image is taken from a moving image sequence with the
potential of a high amount of local space-time details. The image has been processed
according to the present invention in blocks, and for each block a variance of normal flow
magnitude has been calculated as a measure representing the amount of space-time
details.

In Fig. 2b the blocks of the image of Fig. 2a are shown in a grey scale indicating normal
flow magnitude variance and thereby the amount of local space-time details.
White coloured blocks indicate regions with a high level of normal flow variance, whereas
dark grey blocks indicate regions with a low level of normal flow variance. As seen from
Fig. 2b white blocks appear in parts of the image with splashing water and thus these local
image regions are found to exhibit a large amount of local space-time details according to
the processing method. The steady image regions, such as the person to the left and the
fountain basin to the right, are seen to be dark grey, indicating that these regions are
detected to exhibit a low normal flow variance.

Fig. 3 shows a flow diagram structure of a system for processing space-time details
information. The system sketched in Fig. 3 can be used for different applications by using
different ones of the paths A, B and C indicated in the flow diagram. The elements of Fig. 3 are:

VI: Video Input
Pre-P: Pre-processing
STDE: Space-time detail estimation and detection
Post-P: Post-processing
VQI: Visual quality improvement
Disp: Display
St: Storage medium

Video input of Fig. 3 represents a video signal representing a sequence of images. The
video input may either be applied directly, such as by a wired or wireless connection, or, as
indicated in Fig. 3, the video signal may be stored on a storage medium before being processed.
The storage medium may be a hard disk, a writeable CD, a DVD, computer memory etc. Input
may either be in a compressed video format, such as MPEG or H.26x, or it may be a
non-compressed signal, i.e. a full resolution representation of the video signal. If an analog
video signal is input, the VI step may include an analog to digital conversion.

Pre-processing of Fig. 3 is optional. If preferred, various signal processing may be applied
in order to reduce noise or other visual artefacts in the video signal before applying the
space-time detection processing. This enhances the effect of the space-time detection
processing.

Space-time detail estimation and detection (STDE) is performed according to the
above-described methods. Preferably the method includes calculation of visual normal flow and it
may further include calculation of visual normal acceleration. The necessary calculation
means may be a dedicated video signal processor. Alternatively, since the amount of
calculations needed with the methods according to the present invention is modest, the
signal processing may be implemented using signal processing power already present in the
device, such as a TV set or a DVD player.

Post-processing may include various per block statistical methods performed on statistical
results for each of the blocks of the STDE part of the system of Fig. 3. The post-processing
may further include an integration in time of the statistical results for each of the blocks of
the STDE step of Fig. 3. In addition, the post-processing may comprise determining a
pattern of temporal evolution of the per block statistics. This is necessary to
determine which parts have stable statistics.
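The integration in time of per block statistics can take many forms. As one hypothetical possibility, the Python sketch below keeps a running exponential average of each block's statistic, so that blocks with stable statistics converge to a steady value. The smoothing weight `alpha` and the name `integrate_in_time` are assumptions introduced for illustration only.

```python
def integrate_in_time(previous, current, alpha=0.25):
    """Blend the per-block statistics of the current frame into a running value.

    previous and current map a block index to a statistic (e.g. variance).
    alpha is a hypothetical smoothing weight, not taken from the source.
    Blocks without history start from their current value.
    """
    merged = {}
    for block, value in current.items():
        old = previous.get(block, value)
        merged[block] = (1 - alpha) * old + alpha * value
    return merged
```

Calling the function once per frame with the previous result yields, for each block, a temporally smoothed statistic whose fluctuation can be inspected to judge stability.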

Using path A of Fig. 3 the video signal is stored after detection of space-time details.
Preferably, the video signal is stored together with indexing information allowing further
processing to be performed later.

Alternatively, visual quality improvement means may be applied before storing, i.e. path B
may be used. Visual quality improvement means may be applied to the signal so as to
utilise the provided information regarding local regions of images containing a large
amount of space-time details. For a non-compressed video signal this may be done by
allocating, to blocks with space-time details, a larger data rate than would normally be
allocated by standard coding schemes - for example by reducing the quantisation scale in
I-frame and P-frame coding, to cope with higher levels of details. The signal may then be
stored in an encoded version, but processed so as to eliminate or avoid visual
artefacts. Alternatively, the video signal may be stored without encoding but provided with
indexing information indicating blocks or regions with space-time details, thus enabling further
processing such as later encoding, or use of the space-time index information as a search
criterion.
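Reducing the quantisation scale for detected blocks could be sketched as below. This Python fragment is purely illustrative: the halving factor `reduction`, the lower bound `min_scale` and the name `quantiser_scale` are hypothetical, and a real encoder would also have to respect its rate control constraints.

```python
def quantiser_scale(base_scale, variance, threshold, reduction=0.5, min_scale=1):
    """Pick a per-block quantiser scale; detected blocks get a finer one.

    reduction and min_scale are hypothetical tuning values. A smaller
    scale means finer quantisation and therefore more bits for the block.
    """
    if variance > threshold:
        return max(min_scale, int(round(base_scale * reduction)))
    return base_scale
```

Blocks whose statistic stays below the threshold keep the scale chosen by the standard coding scheme, so only the regions with space-time details receive additional bits.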

The last processing part of the system of Fig. 3 is a visual output, i.e. display, such as on a
TV screen, a computer screen etc. Alternatively, the video signal may be applied to further
devices or processors before being displayed or stored.

An application (i) of the principles according to the present invention is to eliminate or at
least reduce visual artefacts in a video signal, such as blockiness or temporal
flickering, by allocating more bits to blocks detected to exhibit space-time details. In some
situations it may be preferred merely to obtain an indication of image/video regions which
will probably contain visual artefacts, such as blockiness, ringing, and mosquito "noise", for
digitally (MPEG, H.26x) processed videos once encoded.

Another application (ii) is to implement a low cost motion detection indicator for field
insertion in de-interlacing for TV systems that can profit from a spatial sharpness
improvement. This may be especially suitable for application within low cost de-interlacers,
the principles according to the invention providing partial motion compensation
information.

Yet another application (iii) is to detect, segment, index and retrieve image regions
detected to exhibit space-time details in long video databases. In this way it may be
possible to provide a search facility that allows a quick indexing of sequences of e.g. video
films that contain waterfalls, ocean waves, hair/leaves/grass moving in the wind etc.
Depending on which application is targeted, different processing blocks are used.
Yet another possible application (iv) is to perform selective sharpening, i.e. to adaptively
change the spatial sharpness (peaking and clipping) to highlight selected regions of an
image where a sharper image is desired, and to reduce the possibility of increasing the
visibility of digital artefacts in regions that are de-selected.

For example, application (i) can be used in visual quality improvement for both display
and storage applications. For display applications path C in Fig. 3 is used. Display
applications may be, for example, high quality TV sets. Detection and segmentation of
space-time details is important because visual artefacts can be eliminated or at least
reduced by an appropriate allocation of bits in response to local/regional image
characteristics, such as a customised bit-rate control per 8x8 or 16x16 image block. This
is important because merely detecting visual artefacts after they have been introduced is
often too late to reduce their visibility or their effect on the visual quality of motion
pictures when displayed.

In storage applications, path A or path B of Fig. 5 may be used. With path A, the video
signal is stored before any visual quality improvement is performed. However, path A
may include detection and segmentation of space-time details and storage of an index of
regions, such as 8x8 or 16x16 pixel blocks, that contain a large amount of space-time
details. In this way long video databases (stored content) may be processed so as to enable
further processing at a later stage. This is useful for content information that is highly
detailed and for which no effective representation is known for content description. Video
signals may be stored either compressed or uncompressed. If the data is stored uncompressed,
a later compression can be performed taking advantage of the stored index relating to local
space-time details.

With path B, video signals are stored after being processed to increase visual quality
based on the detected local space-time details. As mentioned, the
visual quality improvement could be performed by allocating more data to blocks
exhibiting space-time details. Therefore, path B may also be used for processing large
video databases. With path B, video signals can be stored compressed, since a proper
signal treatment has been carried out ensuring that a high visual quality regarding
space-time details is obtained even when compression is used.

Among a large number of different devices or systems, or parts thereof, the
principles according to the invention may be applied within TV systems, such as TV sets,
and DVD+RW equipment, such as DVD players or DVD recorders. The proposed methods
may be applied within digital (LCD, LCoS) TV sets, where new types of digital artefacts
occur and/or become more visible, thus requiring a generally high video signal quality.

The principles of the present invention relating to visual quality improvement may also be
used within wireless hand-held miniature devices featuring displays adapted for showing
motion pictures. For example, a high visual quality of motion pictures on mobile phones
with near-to-the-eye displays can be combined with a still moderate data rate requirement.
For devices with quite poor spatial resolution, the visual quality improvements according
to the invention may be used to reduce the required data rate for the video signal while
still avoiding blockiness and related visual artefacts.

In addition, the principles according to the invention may be applied within MPEG coding
and decoding equipment. The methods may be applied within such encoders or decoders.
Alternatively, separate video processor devices may be applied prior to existing encoders.
The principles according to the invention may be applied within consumer equipment as
well as within professional equipment.

In an embodiment of a video signal encoder according to the invention, a quantisation
scale depending on space-time details information is applied at the encoder side. The
quantisation scale is modulated by space-time details information. The smaller this
scale, the more steps the quantiser has and the more spatial detail is enhanced; the larger
the scale, the fewer steps the quantiser has and the more spatial detail is blurred.
Preferably, a video signal encoder according to the invention is
capable of producing signal formats in accordance with MPEG or H.26x formats.

In a preferred embodiment, a fixed quantisation scale per macroblock, q_sc, is used. A
modulation is applied to q_sc, wherein the modulation uses information about space-time
details. For each macroblock the normal flow (per pixel) and its average and variance
σ_vn (per macroblock) are calculated. From experiments it is known that the normal flow
variance has a histogram for which the Gamma (Erlang) function is a good fit. With this
knowledge, it is possible to fit:

M(x) = x × exp(−(x − 1))

(shifted Gamma (Erlang) function) to the histogram of σ_vn. With this, the quantisation
scale per macroblock becomes:

q_sc_m = F(δ × q_sc − λ × M(σ_vn)),

where F() represents the operations of rounding and table look-up, and δ and λ are real
numbers (positive for δ, and positive or negative for λ) that are adjusted according to
the overall amount of bits to be assigned per frame (video sequence).
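
A minimal sketch of this modulation in Python follows. It assumes that F() can be
approximated by rounding followed by clamping to the range 1 to 31 (a stand-in for the table
look-up, whose contents are not given here), and that the normal flow variance has been
scaled so that the histogram peak lies near 1. The names `delta`, `lam`, `q_min` and `q_max`
and their default values are hypothetical.

```python
import math


def shifted_gamma(x):
    """Shifted Gamma (Erlang) function M(x) = x * exp(-(x - 1)); M(1) = 1."""
    return x * math.exp(-(x - 1.0))


def modulated_q_scale(q_sc, sigma_vn, delta=1.0, lam=4.0, q_min=1, q_max=31):
    """Quantisation scale per macroblock after modulation.

    q_sc      -- fixed quantisation scale of the macroblock
    sigma_vn  -- normal flow variance of the macroblock (assumed normalised)
    delta     -- positive scaling factor applied to q_sc (assumed value)
    lam       -- modulation strength, may be positive or negative (assumed value)
    """
    raw = delta * q_sc - lam * shifted_gamma(sigma_vn)
    # Assumed stand-in for F(): round, then clamp to a legal scale range.
    return max(q_min, min(q_max, round(raw)))


# Usage: a macroblock at the histogram peak gets a finer quantiser
# than a flat macroblock with zero normal flow variance.
detailed = modulated_q_scale(10, 1.0)
flat = modulated_q_scale(10, 0.0)
```

With a positive modulation strength the scale drops, and hence more bits are spent, for
macroblocks whose variance lies near the peak of the fitted function.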

Fig. 4 shows an example of a histogram plotted for a sequence exhibiting image parts with
a high amount of space-time details. The sequence processed is the sequence of a girl
running in the foreground, while part of the background is the sea with water waves hitting
rocks. The histogram of Fig. 4 shows the number of blocks as a function of normal flow
variance. The white bars indicate flat areas, i.e. areas with a small amount of space-time
details, e.g. the sky. The black bars indicate areas with a high amount of space-time
details, e.g. water waves hitting the rocks. As seen from the histogram, there is a good
correlation between space-time details and normal flow variance, since bars representing
areas with a small amount of space-time details are grouped towards low normal flow
variance values, while bars representing a high amount of space-time details are grouped
towards high normal flow variance values.
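
The following Python sketch illustrates how such a histogram could be produced. It assumes
the commonly used definition of normal flow magnitude, |I_t| / |grad I|, estimated with
simple finite differences on two consecutive grey-level frames; the definition used elsewhere
in this document may differ in detail. All function names, the block size and the threshold
are hypothetical.

```python
from statistics import pvariance


def normal_flow_magnitude(prev, curr, eps=1e-6):
    """Per-pixel normal flow magnitude; border pixels are left at 0."""
    h, w = len(curr), len(curr[0])
    flow = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (curr[y][x + 1] - curr[y][x - 1]) / 2.0  # spatial gradient, x
            iy = (curr[y + 1][x] - curr[y - 1][x]) / 2.0  # spatial gradient, y
            it = curr[y][x] - prev[y][x]                  # temporal difference
            grad = (ix * ix + iy * iy) ** 0.5
            flow[y][x] = abs(it) / grad if grad > eps else 0.0
    return flow


def block_variances(flow, block=8):
    """Variance of the normal flow inside each non-overlapping block."""
    h, w = len(flow), len(flow[0])
    out = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            vals = [flow[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            out[(by, bx)] = pvariance(vals)
    return out


def detect_blocks(variances, level):
    """Blocks whose variance exceeds a predetermined level."""
    return [pos for pos, var in variances.items() if var > level]


def histogram(values, bin_width, n_bins):
    """Number of blocks per variance bin; the last bin collects overflow."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / bin_width), n_bins - 1)] += 1
    return counts
```

Calling `histogram(list(block_variances(flow).values()), bin_width, n_bins)` on a real
sequence would give the block counts per variance bin that a plot like Fig. 4 displays.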

In the foregoing, and also with regard to the accompanying claims, it will be appreciated
that expressions such as "incorporate", "contain", "include", "comprise", "is" and "have"
are intended to be construed non-exclusively, namely other parts or components are
potentially present which have not been explicitly specified.

Claims

1. A method of detecting local space-time details of a video signal representing a
plurality of images, the method comprising, for each image, the steps of:

A) dividing the image into one or more blocks of pixels,

B) calculating at least one space-time feature for at least one pixel within each of
said one or more blocks,

C) calculating for each of the one or more blocks at least one statistical parameter
for each of the at least one space-time features calculated within the block, and

D) detecting blocks wherein the at least one statistical parameter exceeds a
predetermined level.

2. A method according to Claim 1, wherein the at least one space-time feature is 
selected from a group consisting of: visual normal flow magnitude, visual normal flow 
direction. 

3. A method according to Claim 1, wherein the at least one space-time feature is
selected from a group consisting of: visual normal acceleration magnitude, and visual
normal acceleration direction.

4. A method according to Claim 1, wherein the at least one statistical parameter of 
step D) is selected from a group consisting of: variance, average, and at least one 
parameter of a probability function. 

5. A method according to Claim 1, wherein the one or more blocks of pixels are one or
more non-overlapping square blocks, and wherein a size of the one or more square blocks
is selected from a group consisting of: 2x2 pixels, 4x4 pixels, 6x6 pixels, 8x8 pixels, 12x12
pixels, and 16x16 pixels.

6. A method according to Claim 1, further comprising the step of pre-processing the
image prior to applying step A), so as to reduce noise in the image.

7. A method according to Claim 6, wherein the step of pre-processing comprises 
convolving the image with a low-pass filter. 

8. A method according to Claim 1, further comprising an intermediate step between
step C) and D), wherein the intermediate step comprises calculating at least one
inter-block statistical parameter involving at least one of the statistical parameters
calculated for each block.

9. A method according to Claim 8, wherein the at least one inter-block statistical
parameter is calculated using a 2-D Markovian non-causal neighbourhood structure.

10. A method according to Claim 1, further comprising the step of determining a
pattern of temporal evolution for each of the at least one statistical parameter calculated in
step C).

11. A method according to Claim 1, further comprising the step of indexing at least
part of an image comprising one or more blocks detected in step D).

12. A method according to Claim 1, further comprising the steps of calculating 
horizontal and vertical histograms of the at least one space-time feature calculated in step 
C). 

13. A method according to Claim 1, further comprising the step of increasing data rate
allocation to the one or more blocks detected in step D).

14. A method according to Claim 1, further comprising the step of inserting an image in 
a de-interlacing system. 

15. A system for detecting local space-time details of a video signal representing a 
plurality of images, the system comprising: 

- means for dividing an image into one or more blocks of pixels,

- space-time feature calculating means for calculating at least one space-time feature for
at least one pixel within each of the one or more blocks,

- statistical parameter calculating means for calculating for each of the one or more blocks
at least one statistical parameter for each of the at least one space-time features
computed within the one or more blocks, and

- detecting means for detecting one or more blocks wherein the at least one statistical
parameter exceeds a predetermined level.

16. A device comprising a system according to Claim 15. 

17. A signal processor system programmed to operate according to the method of
Claim 1.

18. A de-interlacing system for a television (TV) apparatus, the de-interlacing system
operating according to the method of Claim 1.

19. A video signal encoder for encoding a video signal representing a plurality of
images, the video signal encoder comprising:

- means for dividing an image into one or more blocks of pixels,

- space-time feature calculating means for calculating at least one space-time feature for
at least one pixel within each of the one or more blocks,

- statistical parameter calculating means for calculating for each of the one or more blocks
at least one statistical parameter for each of the at least one space-time features
computed within the one or more blocks,

- means for allocating data to the one or more blocks according to a quantisation scale,
and

- means for adjusting the quantisation scale for the one or more blocks in accordance with
the at least one statistical parameter.

20. A video signal representing a plurality of images, the video signal comprising 
information regarding image segments exhibiting space-time details suitable for use with 
the method of Claim 1. 

21. A video storage medium comprising video signal data according to Claim 20. 

22. A computer useable medium having a computer readable program code embodied
therein, the computer readable program code comprising:

- means for causing a computer to read a video signal representing a plurality of images,

- means for causing the computer to divide a read image into one or more blocks of pixels,

- means for causing the computer to calculate at least one space-time feature for at least
one pixel within each block,

- means for causing the computer to calculate for each of the blocks at least one statistical
parameter for each of the at least one space-time features calculated within the one or
more blocks, and

- means for causing the computer to detect blocks wherein the at least one statistical
parameter exceeds a predetermined level.

23. A video signal representing a plurality of images, the video signal being
compressed according to a video compression standard, such as MPEG or H.26x,
comprising a specified individual allocation of data to blocks of each image, wherein a data
rate allocated to one or more selected blocks of images exhibiting space-time details is
increased compared to the specified allocation of data to the one or more selected blocks.

24. A method of processing a video signal, wherein the method of processing
comprises the method of Claim 1.

25. An integrated circuit comprising means for processing a video signal according to
the method of Claim 1.

26. A program storage device readable by a machine and encoding a program of
instructions for executing the method of Claim 1.

Abstract

The invention relates to video signal processing such as for TV or DVD signals. Methods
and systems for detection and segmentation of local visual space-time details in video
signals are described. Furthermore, a video signal encoder is described. The method
described comprises the steps of dividing an image into blocks of pixels, calculating
space-time feature(s) within each block, calculating statistical parameter(s) for each
space-time feature, and detecting blocks wherein the statistical parameter(s) exceed a
predetermined level. Preferably, visual normal flow is used as a local space-time feature.
In addition, visual normal acceleration may be used as a space-time feature. In preferred
embodiments visual artefacts, such as blockiness, caused by MPEG or H.26x encoding
can be reduced by allocating a larger amount of bits to local image parts exhibiting a large
amount of space-time details.



Fig. 3 is to be published with the abstract.